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Abstract.  This  paper  deals  with  generalized  pattern  search  (GPS)  algorithms  for  linearly  constrained  optimization.  At  each 
iteration,  the  GPS  algorithm  generates  a  set  of  directions  that  conforms  to  the  geometry  of  any  nearby  linear  constraints.  This 
set  is  then  used  to  construct  trial  points  to  be  evaluated  during  the  iteration.  In  previous  work,  Lewis  and  Torczon  developed 
a  scheme  for  computing  the  conforming  directions,  but  it  assumed  no  degeneracy  near  the  current  iterate.  The  contribution  of 
this  paper  is  to  provide  a  detailed  algorithm  for  constructing  the  set  of  directions  whether  or  not  the  constraints  are  degenerate. 
One  difficulty  in  the  degenerate  case  is  in  classifying  constraints  as  redundant  and  nonredundant.  We  give  a  short  survey  of  the 
main  definitions  and  methods  for  treating  redundancy  and  propose  an  approach  to  identify  nonredundant  e-active  constraints, 
which  may  be  useful  for  other  active  set  algorithms.  We  also  introduce  a  new  approach  for  handling  nonredundant  linearly 
dependent  constraints,  which  maintains  GPS  convergence  properties  without  significantly  increasing  computational  cost.  Some 
simple  numerical  tests  illustrate  the  effectiveness  of  the  algorithm.  We  conclude  by  briefly  considering  the  extension  of  our  ideas 
to  nonlinear  constraints  with  linearly  dependent  constraint  gradients. 
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1.  Introduction.  This  paper  continues  the  development  of  generalized  pattern  search  (GPS)  algo¬ 
rithms  mm  for  linearly  constrained  optimization  problems 
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£,u  £  {R  U  {±oo}}m,  and  t  <  u.  As  is  evident,  (1.2 1  reduces  to  the  definition  in  jH|T5],  where  the  rth  row 
ar  of  the  matrix  A  is  equal  to  some  ith  row  af  of  the  matrix  AT  with  a  coefficient  of  +1  or  — 1,  and  bi  =  ur 
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We  target  the  case  when  the  function  f{x)  may  be  an  expensive  “black  box”,  provide  few  correct  digits, 
or  may  fail  to  return  a  value  even  for  feasible  points  x  £  Cl.  In  this  situation,  the  accurate  approximation 
of  derivatives  is  not  likely  to  be  practical.  GPS  algorithms  rely  on  simple  decrease  in  /(a;);  i.e.,  an  iterate 
Xk+i  £  Cl  satisfying  f(xk+ 1)  <  f{xu)  is  considered  successful. 

Lewis  and  Torczon  m  introduced  and  analyzed  the  generalized  pattern  search  for  linearly  constrained 
minimization  problems.  They  proved  that  if  the  objective  function  is  continuously  differentiable  and  if  the 
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set  of  directions  that  defines  a  local  search  is  chosen  properly  with  respect  to  the  geometry  of  the  boundary  of 
the  feasible  region,  then  GPS  has  at  least  one  limit  point  that  is  a  Karush-Kuhn- Tucker  point.  By  applying 
the  Clarke  nonsmooth  calculus  HU.  Audet  and  Dennis  j3]  simplified  the  analysis  in  m  and  introduced  a 
new  hierarchy  of  convergence  results  for  problems  with  varying  degrees  of  nonsmoothness.  Second-order 
behavior  of  GPS  is  studied  in  [2] . 

Generalized  pattern  search  algorithms  generate  a  sequence  of  iterates  {xk}  in  R"  with  non-increasing 
objective  function  values.  In  linearly  constrained  optimization,  a  set  of  directions  that  defines  the  so-called 
poll  step  must  conform  to  the  geometry  of  the  boundary  of  the  feasible  region.  The  key  idea,  which  was 
first  suggested  by  May  in  [18]  and  applied  to  the  GPS  in  [T6] ,  is  to  use  as  search  directions  the  generators  of 
cones  polar  to  those  generated  by  the  normals  of  faces  near  the  current  iterate. 

Lewis  and  Torczon  m  presented  an  algorithm  for  constructing  the  set  of  generators  in  the  nondegenerate 
case,  and  left  the  degenerate  case  for  future  work.  In  their  recent  work,  Kolda  et  al.  (T3J  mentioned  that  the 
problem  when  constraints  are  degenerate  has  been  well  studied  in  computational  geometry  and  the  solution 
to  the  problem  exists  in  HIS]-  However,  there  are  examples  when  the  method  proposed  in  GDIS]  requires  full 
enumeration,  which  can  be  cost-prohibitive. 

Price  and  Coope  [21j  gave  as  an  aside  a  result  that  can  be  used  for  constructing  a  set  of  generators  in 
the  degenerate  case.  It  follows  from  their  result  that,  in  order  to  construct  a  set  of  generators,  it  is  sufficient 
to  consider  maximal  linearly  independent  subsets  of  the  active  constraints.  However,  this  approach  implies 
enumeration  of  all  possible  linearly  independent  subsets  of  maximal  rank  and  does  not  take  into  account 
properties  of  the  problem  that  can  help  to  reduce  this  enumeration.  Price  and  Coope  |21|  outlined  an  algo¬ 
rithm  for  constructing  frames,  but  it  was  not  their  point  to  work  out  details  of  the  numerical  implementation 
in  the  degenerate  case. 

The  purpose  of  this  paper  is  to  give  detailed  consideration  to  GPS  in  the  degenerate  case  in  a  way  that  is 
complementary  to  [4j  and  [16] .  Our  main  result  is  a  detailed  algorithm  for  constructing  the  set  of  generators 
at  a  current  GPS  iterate  in  both  the  degenerate  and  nondegenerate  cases.  To  construct  the  set  of  generators 
in  the  degenerate  case,  we  identify  the  redundant  and  nonredundant  active  constraints  and  then  use  either 
QR  decomposition  or  a  construction  proposed  in  [13- 

Classification  of  constraints  as  redundant  or  nonredundant  is  one  of  the  main  issues  here,  because  it  is 
sufficient  to  construct  the  set  of  generators  only  for  nonredundant  constraints.  Several  methods  for  classifying 
constraints  exist.  For  example,  there  are  deterministic  algorithms  mm,  probabilistic  hit-and-run  methods 
[?i,  and  a  probabilistic  method  based  on  an  equivalence  between  the  constraint  classification  problem  and 
the  problem  of  finding  a  feasible  solution  to  a  set  covering  problem  [9] .  A  survey  and  comparison  of  strategies 
for  classifying  constraints  are  given  in  urn-  Any  of  these  approaches  can  be  applied  in  the  GPS  framework 
to  identify  redundant  and  nonredundant  constraints.  However,  in  the  paper,  we  propose  a  new  projection 
approach  to  identify  nonredundant  constraints  that  is  more  suitable  for  GPS  methods. 

The  projection  method  is  similar  to  the  hit-and-run  algorithm  [7],  in  which  nonredundant  constraints 
are  searched  for  along  random  direction  vectors  from  each  point  in  a  sequence  of  random  interior  points, 
but  differs  in  its  use  of  a  deterministic  direction.  The  major  advantage  of  the  projection  method  for  our 
application  is  that  the  number  of  direction  vectors  (in  the  terminology  of  the  hit-and-run  algorithm)  is  equal 
to  the  number  of  constraints  that  have  to  be  identified.  For  us  this  is  generally  a  small  number.  In  the  hit- 
and-run  algorithm,  this  number  is  determined  by  a  stop  criterion  and  can  be  large  if  many  of  the  randomly 
generated  directions  do  not  detect  a  nonredundant  constraint.  Moreover,  the  formulas  used  in  the  projection 
method  are  simpler  than  those  used  for  computing  the  intersection  points  of  a  direction  vector  with  the 
hyperplanes  in  the  hit-and-run  algorithm.  We  should  note  also  that  the  goal  of  hit-and-run  is  to  detect  all 
nonredundant  constraints  in  a  full  system  of  linear  inequalities.  We  use  the  projection  method  to  detect  the 
nonredundant  constraints  among  only  active  constraints  in  the  case  when  they  are  linearly  dependent.  As 
our  numerical  tests  show,  the  projection  method  cheaply  detects  all,  or  almost  all,  nonredundant  constraints. 

To  classify  constraints  not  detected  by  the  projection  method,  we  use  another  approach  outlined  in  EH- 
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As  a  result,  we  ensure  that  every  active  constraint  is  detected  as  either  redundant  or  nonredundant.  In  the 
worst  case,  we  may  have  linearly  dependent,  nonredundant  constraints.  We  propose  a  general  approach  for 
handling  this  case  with  an  accompanying  convergence  theorem,  along  with  two  specific  instances  that  can 
be  used  effectively  in  practice. 

In  the  end,  we  briefly  discuss  the  extension  of  our  ideas  to  optimization  problems  with  general  nonlinear 
constraints  that  are  linearly  dependent  at  a  solution.  We  do  so  by  applying  the  projection  method  to  a 
linearization  of  the  constraints,  and  we  argue  that  it  is  less  costly  than  applying  the  approach  of  uni. 

The  organization  of  the  paper  is  as  follows.  In  the  next  section,  we  give  a  brief  description  of  GPS  as 
well  as  the  convergence  result  for  linearly  constrained  minimization  following  papers  by  Audet  and  Dennis 
0,  and  by  Lewis  and  Torczon  [IB].  Section  [3]  is  devoted  to  the  topic  of  redundancy.  In  the  first  part  of  the 
section,  we  introduce  a  definition  of  the  £-active  constraints  and  discuss  some  scaling  issues.  The  second 
part  of  Section  [3]  contains  essential  definitions  and  results  on  redundancy  nni  mo  mo  na  that  are  required 
for  our  analysis.  Then  we  propose  our  projection  method  to  determine  nonredundant  constraints,  and  we 
briefly  describe  a  more  expensive  follow-up  approach  to  be  applied  if  some  constraints  are  not  identified  by 
the  projection  method.  In  Section  [4]  we  give  an  algorithm  for  constructing  the  set  of  generators  and  discuss 
implementation  details,  including  a  new  approach  for  handling  nonredundant  linearly  dependent  constraints 
in  a  rigorous  way  without  significantly  increasing  computational  cost.  In  Section  [5]  we  consider  the  extension 
of  our  ideas  to  nonlinearly  constrained  problems.  Section  [6]  is  devoted  to  some  concluding  remarks. 

Notation.  R,  Z,  and  N  denote  the  set  of  real  numbers,  integers,  and  nonnegative  integers,  respectively. 
For  any  finite  set  S',  we  may  refer  to  the  matrix  S  as  the  one  whose  columns  are  the  elements  of  S.  Similarly, 
for  any  matrix  A,  the  notation  a  €  A  means  that  a  is  a  column  of  A. 

2.  Generalized  pattern  search  algorithms.  In  this  section,  we  briefly  describe  the  class  of  GPS 
algorithms  for  linearly  constrained  minimization,  along  with  the  main  convergence  result.  We  follow  papers 
by  Audet  and  Dennis  jl]  and  by  Lewis  and  Torczon  and  we  refer  the  reader  there  for  details  of  managing 
the  mesh  size  A*,.  Throughout,  we  will  always  use  the  l 2  norm. 

GPS  algorithms  can  be  applied  either  to  the  objective  function  /  or  to  the  barrier  function  /q  =  f  +  if o  : 
R"  — >  Ru{+oo},  where  if n  is  the  indicator  function  for  Q,  which  is  zero  on  f l  and  00  elsewhere.  The  value  of 
fn  is  +00  on  all  points  that  are  either  infeasible  or  at  which  /  is  declared  to  be  +00.  This  barrier  approach 
is  probably  as  old  as  direct  search  methods  themselves. 

A  GPS  algorithm  for  linearly  constrained  optimization  generates  a  sequence  of  iterates  {xk}  in  f 1.  The 
current  iterate  xk  G  R"  is  chosen  from  a  finite  number  of  points  on  a  mesh ,  which  is  a  discrete  subset  of  Rn. 
At  iteration  At,  the  mesh  is  centered  around  the  current  mesh  point  (current  iterate)  Xk  and  its  fineness  is 
parameterized  through  the  mesh  size  parameter  Ak  >  0  as 

Mk  =  {xk  +  A kDz  :  ^  G  NnD},  (2.1) 


where  D  is  a  finite  matrix  whose  columns  form  a  set  of  positive  spanning  directions  in  Rra,  njj  is  the  number 
of  columns  of  the  matrix  D.  At  each  iteration,  some  positive  spanning  matrix  Dk  composed  of  columns  of 
D  is  used  to  construct  the  poll  set, 


Pk  =  { xk  +  Akd  '■  d  G  Dk}. 


(2.2) 


A  two-dimensional  mesh  and  poll  set  are  illustrated  in  Figure  2.1 


If  xk  G  is  not  near  the  boundary,  then  Dk  is  a  positive  spanning  set  for  Rn  m-  If  xk  G  is  near  the 
boundary,  the  matrix  Dk  is  constructed  so  its  columns  dj  also  span  the  cone  of  feasible  directions  at  xk  and 
conform  to  the  geometry  of  the  boundary  of  f 2.  Hence,  the  set  D  must  be  rich  enough  to  contain  generators 
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xk+Akd 

d&Dk 


Fig.  2.1.  A  mesh  and  poll  set  in  R2 


for  the  tangent  cone  Tq(x)  =  cl{/i(w  —  x)  :  p  >  0,  u  G  0}  for  every  x  G  O.  More  formally,  the  sets  Dk  must 
satisfy  the  following  definition. 

Definition  2.1.  A  rule  for  selecting  the  positive  spanning  sets  Dk  Q  D  conforms  to  f l  for  some  e  >  0, 
if  at  each  iteration  k  and  for  each  y  in  the  boundary  of  for  which  \\y  —  Xfe||  <  e,  To(y)  is  generated  by  a 
nonnegative  linear  combination  of  columns  of  Dk- 

Each  GPS  iteration  is  divided  into  two  phases:  an  optional  search  and  a  local  poll.  In  each  step,  the 
barrier  objective  function  is  evaluated  at  a  finite  number  of  mesh  points  in  an  attempt  to  find  one  that  yields 
a  lower  objective  function  value  than  the  incumbent.  We  refer  to  such  a  point  as  an  improved  mesh  point. 
If  an  improved  mesh  point  is  found,  it  becomes  the  incumbent,  so  that  f(xk+ 1)  <  f(xk).  The  mesh  size 
parameter  is  then  either  held  constant  or  increased. 

In  the  search  step,  there  is  complete  flexibility.  Any  strategy  may  be  used  (including  none),  and  the 
user’s  knowledge  of  the  domain  may  be  incorporated.  If  the  SEARCH  step  fails  to  yield  an  improved  mesh 
point,  the  poll  step  is  invoked.  In  this  second  step,  the  barrier  objective  function  is  evaluated  at  points  in 
the  poll  set  Pk  (i.  e.,  neighboring  mesh  points)  until  an  improved  mesh  point  is  found  or  until  all  the  points 
in  Pk  have  been  evaluated.  If  both  the  SEARCH  and  poll  steps  fail  to  find  an  improved  mesh  point,  then 
the  incumbent  is  declared  to  be  a  mesh  local  optimizer  and  is  retained  as  the  incumbent,  so  that  Xk+i  =  Xk- 
The  mesh  size  parameter  is  then  decreased.  Figure  [272]  gives  a  description  of  a  basic  GPS  algorithm. 

We  remind  the  reader  that  the  normal  cone  N^x)  to  at  x  is  the  nonnegative  span  of  all  the  outwardly 
pointing  constraint  normals  at  x  and  can  be  written  as  the  polar  of  the  tangent  cone:  Nq,(x)  =  {v  G  : 
Vw  G  Th(x),  vtoj  <  0}. 

Assumptions.  We  make  the  following  standard  assumptions  [3]: 

A1  A  function  /n  and  x0  G  R"  (with  fn(x0)  <  oo)  are  available. 

A2  The  constraint  matrix  A  is  rational. 

A3  All  iterates  {xp}  produced  by  the  GPS  algorithm  lie  in  a  compact  set. 

Under  these  assumptions,  Torczon  [22]  showed  that  liminf  A*,  =  0,  and  Audet  and  Dennis  [T|  identified 
the  following  subsequences,  for  which  the  limit  of  is  zero. 

Definition  2.2.  A  subsequence  of  mesh  local  optimizers  {xk}k&K  (for  some  subset  of  indices  K)  is 
said  to  be  a  refining  subsequence  if  {Ak}keK  converges  to  zero. 

Audet  and  Dennis  [3]  proved  the  following  convergence  results  for  GPS  in  the  linearly  constrained  case 
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Initialization: 

Let  Xq  be  such  that  fn(xo)  is  finite.  Let  D  be  a  positive  spanning  set,  and 
let  Mo  be  the  mesh  on  R”  defined  by  Ao  >  0,  and  Dq.  Set  the  iteration 
counter  k  =  0. 

Search  and  poll  step: 

Perform  the  SEARCH  and  possibly  the  poll  steps  (or  only  part  of  them) 
until  an  improved  mesh  point  Xk+i  with  the  lowest  /n  value  so  far  is  found 
on  the  mesh  defined  by  equation  (2.1 1. 

—  Optional  SEARCH:  Evaluate  /a  on  a  finite  subset  of  trial  points  on  the 
mesh  Mfc  defined  by  (2.1 1  (the  strategy  that  gives  the  set  of  points 
is  usually  provided  by  the  user;  it  must  be  finite  and  the  set  can  be 
empty). 

—  Local  POLL:  Evaluate  fn  on  the  poll  set  defined  in  (2.2  ( . 

Parameter  update: 

If  the  SEARCH  or  the  poll  step  produced  an  improved  mesh  point,  i.e.,  a 
feasible  iterate  Xk+i  £  M ;c  n  for  which  fn(xk+i)  <  fn(xk),  then  update 


Afc-|_i  ■">  Afc. 

Otherwise,  fn{%k )  <  fn{xk  +  Akd)  for  all  d  £  Dk  and  so  Xk  is  a  mesh  local 
optimizer.  Set  Xk+i  =  Xk ,  update  Afc+i  <  A^. 

Increase  k  <—  k  +  1  and  go  back  to  the  SEARCH  and  poll  step. 


Fig.  2.2.  A  simple  GPS  algorithm 


using  only  these  assumptions. 

Lemma  2.3.  Under  assumptions  A1-A3,  if  x  is  any  limit  of  a  refining  subsequence,  if  d  is  any  direction 
in  D  for  which  f  at  a  poll  step  was  evaluated  for  infinitely  many  iterates  in  the  subsequence,  and  if  f  is 
Lipschitz  near  x,  then  the  generalized  directional  derivative  of  f  at  x  in  the  direction  d  is  nonnegative,  i.e., 
f°{x-,d)  >  0. 

Theorem  2.4  (Convergence  to  a  Karush-Kuhn-Tucker  point).  |1]  Under  assumptions  A1-A3,  if  f  is 
strictly  differentiable  at  a  limit  point  x  of  a  refining  subsequence,  and  if  the  rule  for  selecting  positive  spanning 
sets  Dk  C  D  conforms  to  S2  for  some  e  >  0,  then  V f(x)Tuj  >  0  for  all  w  £  Tq(x),  and  so  — V/(x)  £  Nq(x). 
Thus,  x  is  a  Karush-Kuhn-  Tucker  point. 

The  purpose  of  this  paper  is  to  provide  an  algorithm  for  constructing  sets  Dk  that  conform  to  the 
boundary  of  f 1.  If  the  active  constraints  are  linearly  dependent,  we  apply  strategies  for  the  identification  of 
redundant  and  nonredundant  constraints,  which  are  described  in  the  next  section,  and  then  construct  sets 
Dk  taking  into  account  only  nonredundant  constraints.  We  now  pause  to  outline  the  main  results  concerning 
redundancy  from  mathematical  programming,  and  then  in  Section  [4]  we  continue  consideration  of  GPS  and 
strategies  for  constructing  the  sets  Dk- 


3.  Redundancy.  We  now  present  some  essential  definitions  and  results  concerning  redundancy  pQ  HU. 
nstnona  that  are  required  for  our  analysis.  Then  we  propose  our  approach,  the  projection  method,  to 
determining  the  nonredundant  constraints  and  briefly  describe  another  approach  that  is  applied  if  some 
constraints  are  not  identified  by  the  projection  method. 


We  consider  the  feasible  region  S2  defined  by  (1.2 1,  and  refer  to  the  inequality  aj  x  <  bj  as  the  j-th 
constraint.  The  region  represented  by  all  but  the  j th  constraint  is  given  by 


f lj  =  {x  £  Mn  :  ajx  <  bi,  i  £ 


where  I\{j}  is  the  set  /  with  the  element  j  removed. 
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The  following  definition  is  consistent  with  definitions  given  in  mm- 

Definition  3.1  (Redundant  constraint).  The  jth  constraint  ajx  <  bj  is  redundant  in  the  description 
of  Cl  if  and  only  if  Cl  =  Clj,  and  is  (necessarily)  nonredundant  otherwise. 

3.1.  e— active  constraints.  We  next  compare  two  definitions  of  e-active  constraints  and  discuss  some 
associated  scaling  issues.  They  are  replicated  from  [16j  and  |2T],  respectively. 

Definition  3.2.  ( e.g .,  m-  Let  some  scalar  e  >  0  be  given  and  Xk  £  Cl.  The  jth  constraint  is  e-active 
at  xk  if 

0  <  bj  —  ajxk  <  e.  (3.1) 


Definition  3.3.  (e.g.,  nn).  Let  some  scalar  e  >  0  be  given  and  Xk  £  Cl.  The  jth  constraint  is  e-active 
at  xk  if 


dist (xk,Hj)  <  e, 


(3.2) 


where  Hj  =  {x  £ 

IT. 


ajx  =  bj},  and  dist(ajfc,  Hj)  =  min  ||y  —  Xk\\  is  the  distance  from  Xk  to  the  hyperplane 


yen 


Clearly,  the  jth  constraint  can  be  made  e-active  at  Xk  in  the  sense  of  Definition  |3.2|  by  multiplying  the 
inequality  bj  —  aj  Xk  >  0  by  a  sufficiently  small  number.  On  the  other  hand,  this  multiplication  does  not 

In  the  paper,  we  prefer  to 


change  the  distance  between  the  point  Xk  and  any  Hj  defined  in  Definition  3.3 
use  Definition  13.21  since  it  is  easier  to  check  than  Definition  13.3 


However,  Definition  |3.2|  is  proper,  if  we 
assume  preliminary  scaling  of  the  constraints  so  that  the  following  lemma  applies. 


Lemma  3.4.  Let  some  scalar  e  >  0  be  given,  Xk  £  Cl,  and  ||ajj|  =  1  for  all  j  £  I  in  (1.2).  Then,  for  any 
£  I,  Definition  3.2  of  the  e-active  constraint  is  equivalent  to  Definition  3.3  and  the  projection  Pj(xk)  of 


the  point  Xk  onto  the  hyperplane  Hj  =  {x  £  R™  :  ajx  =  bj}  is  defined  by 


Pj(Xk)  —  X k  T  ®j(bj  a~  Xk). 


(3.3) 


Proof.  For  any  j  £  /,  the  distance  from  Xk  to  the  hyperplane  Hj  is  given  by 


dist  (xk,Hj)  = 


I  bj  -  ajxk\ 


(3.4) 


Hence,  if  1 1 a,y 1 1  =  1  and  Xk  £  Cl,  (3.1 1  is  equivalent  to  (3.2 1. 

By  definition  of  the  projection  of  Xk  onto  Hj, 

||-fj(*£fc)  %k\\  dist  (x  k ,  Hj) . 

Since  Xk  £  Cl  and  1 1 a, y  1 1  =  1,  it  follows  from  (|3.4|)  that  dist(xfc,  Hj)  =  bj  —  ajxk  and 


Pj(xk)  =  xk  +  aj  dist (xk,Hj)  =  xk  +  aj(bj  -  ajxk). 


Hence,  (3.3)  holds.  □ 


To  satisfy  the  conditions  of  Lemma  3.4  we  introduce  the  matrix  A  and  vector  b  that  are  additional 
scaled  copies  of  A  and  b,  respectively  from  (1.2 1,  such  that 


bi  = 


i  £  I. 
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(3.5) 


Consequently,  ||aj||  =  1  for  all*  £  I  and  SI  =  {a;  £  Mn  :  ATx  <  b}  =  {a;  £  R"  :  ATx  <  6}  =  {x  £  K"  : 
afx  <bi,  i  £  /}. 

We  then  use  A  and  b  to  define  the  set  of  indices  of  the  e-active  constraints  as 


I(xk,e)  =  {i£  I  :  0  <  bz  -  af  xk  <  e}, 


(3.6) 


and  we  apply  the  projection  method  for  detection  of  the  nonredundant  constraints  (see  Section  3.3.1  for 
more  details.)  We  refer  to  the  set  I(xk,e)  as  the  working  index  set  at  the  current  iterate  xk. 


This  paper  also  makes  use  of  the  regions  given  by 

ft(xk,s)  =  {x£i"  :  afx  <  h,  i  £  I(xk,e)}, 


(3.7) 


and 


ttj(xk,e)  =  {x  £  Rn  :  afx<b.t,  i  £  I(xk,  e)\{j}},  j  £  I(xk,e). 

Clearly,  f 1  C  f l(xk,e)  C  f lj(xk,s).  Furthermore,  since  f 1  C  fl(a:fc,£),  if  the  jth  constraint  is  redundant  in  the 
description  of  f2(xfc,e),  it  is  also  redundant  in  the  description  of  SI. 


Fig.  3.1.  An  illustration  of  e- active  and  redundant  constraints.  Constraints  1,  2,  and  3  are  e-active  at  the  current  iterate 
x  and  constraint  2  is  redundant. 


3.2.  Redundancy  in  mathematical  programming.  We  now  give  definitions  and  theorems  consis¬ 
tent  with  the  mathematical  programming  literature  nmnniMHiim  We  begin  with  the  following  definitions, 
which  can  be  found  in  mug.  In  the  discussion  that  follows,  we  use  notation  consistent  with  that  of  Section 
[l]  (see  (1.2)  and  the  discussion  that  follows  it). 

Definition  3.5  (Polyhedron).  A  subset  ofW1  described  by  a  finite  set  of  linear  constraints  P  =  {x  £ 
Rn  :  CT x  <  d}  is  a  polyhedron. 


Obviously,  S7  given  by  (1.2 1  and  il(xk,£)  given  by  (3.7 1  are  polyhedra. 


Definition  3.6.  The  points  xx,...,xk  £  Rn  are  affinely  independent  if  the  k  —  1  directions  x 2  — 
x1, . . . ,  xk  —  x1  are  linearly  independent ,  or  alternatively,  the  k  vectors  (x1, 1), . . . ,  ( xk ,  1)  £  Mn+1  are  linearly 
independent. 


We  will  assume  that  SI  is  full-dimensional,  as  defined  below. 

Definition  3.7.  The  dimension  of  P,  denoted  dim(P),  is  one  less  than  the  maximum  number  of  affinely 
independent  points  in  P.  Then  P  C  Rn  is  full-dimensional  if  and  only  i/dim(P)  =  n. 

Note  that,  if  S7  were  not  full-dimensional,  then  a  barrier  GPS  approach  would  not  be  a  reasonable 
way  to  handle  linear  constraints  because  it  would  be  difficult  to  find  any  trial  in  S7.  Since  we  assume  SI  is 
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full-dimensional,  this  implies  that  its  supersets  f l{xk,e)  and  Llj(xk,e)  are  full-dimensional. 

Definition  3.8  (Valid  inequality).  An  inequality  cj  x  <  dj  is  a  valid  inequality  for  P  C  Rn  if  cj x  <  dj 
for  all  x  £  P. 

Definition  3.9  (Face  and  Facet),  (i)  F  defines  a  face  of  the  polyhedron  P  if  F  =  {x  £  P  :  cjx  =  dj} 
for  some  valid  inequality  cj  x  <  dj  of  P.  F  7^  0  is  said  to  be  a  proper  face  of  P  if  F  ^  P. 

(ii)  F  is  a  facet  of  P  if  F  is  a  face  of  P  and  dim(F)  =  dim(P)  —  1. 

Definition  3.10  (Interior  point).  A  point  x  £  P  is  called  an  interior  point  of  P  if  CTx  <  d. 

We  also  need  the  following  results  from  integer  programming  j25[  pp.  142-144]  and  [19],  pp.  85-92]. 

Proposition  3.11.  [19]  Corollary  2.5]  A  polyhedron  is  full- dimensional  if  and  only  if  it  has  an  interior 
point. 

Theorem  3.12.  [25;  Theorem  9.1]  If  P  is  a  full- dimensional  polyhedron,  it  has  a  unique  minimal 

description 

F  =  {iel"  :  cfx<di ,  i  =  l,...,m}, 


where  each  inequality  is  unique  to  within  a  positive  multiple. 


Corollary  3.13.  [25]  Proposition  9.2]  If  P  is  full- dimensional,  a  valid  inequality  cj  x  <  dj  is  necessary 
in  the  description  of  P  if  and  only  if  it  defines  a  facet  of  P. 


Corollary  3.13  means  that  the  following  concepts  are  equivalent  for  f2(xfc,e)  defined  in  (3.7). 


•  The  jth  inequality  aj x  <  bj  defines  a  facet  of  fi(xfc,e). 

•  The  jth  inequality  ajx  <  bj  is  necessary  (nonredundant)  in  description  of  f l(xk,s),  or  in  other 
words, 


Sl(a;fc, er)  C  Clj(xk,e).  (3.8) 

Our  approach  for  identifying  nonredundant  constraints  is  based  primarily  on  the  following  proposition. 

Proposition  3.14.  Let  a  working  index  set  I(xk,s)  be  given.  An  inequality  ajx  <  bj,  j  £  I(xk,e),  is 
nonredundant  in  the  description  ofH{xk,s)  if  and  only  if  either  I(xk,e)  =  {j}  or  there  exists  x  £  Rra  such 
that  ajx  =  bj  and  ajx  <  bi  for  all  i  £  J(xfc,e)\{j}. 

Proof.  Since  the  case  I(xk,£ )  =  {j}  is  trivial,  we  give  the  proof  for  the  case  when  I{xk,e)\{j}  ^  0. 

Necessity.  Since  the  inequality  ajx  <  bj  is  nonredundant,  then,  by  (3.8 1,  there  exists  x*  £  R"  such 

there  exists  an  interior  point 


3.11 


that  ajx*  <  bi  for  all  i  £  I(xk,e)\{j},  and  ajx*  >  bj.  By  Proposition 

x  £  Ll(xk,s)  such  that  ajx  <  bi  for  all  i  £  I(xk,e).  Thus  on  the  line  between  x*  and  x  there  is  a  point 
x  £  R"  satisfying  aj x  =  bj  and  ajx  <  bi  for  all  i  £  I(xk,e)\{j}- 

Sufficiency.  Let  x  £  fl(xk,s)  be  an  interior  point,  i.e.,  ajx  <  bi  for  all  i  £  I{xk,£).  Since  there  exists 
x  £  Rn  such  that  ajx  =  bj  and  ajx  <  bi  for  all  i  £  I(xk,s)\{j},  then  there  exists  5  >  0  such  that 
x  =  x  +  S(x  —  x)  satisfies  ajx  >  bj  and  ajx  <bi,i£  I(xk,s)\{j}.  Therefore,  ( 3.8 )  holds,  and  by  Definition 


3.1  the  jth  constraint  is  nonredundant.  □ 


Proposition  3.14  means  that  if  the  jth  constraint,  j  £  I(xk,e),  is  nonredundant,  then  there  exists  a 
feasible  point  x  £  Ll(xk,s)  such  that  only  this  constraint  holds  with  equality  at  x. 

Our  approach  for  identifying  redundant  constraints  is  based  primarily  on  the  following  theorem  [ID], 

Theorem  3.15.  The  jth  constraint  is  redundant  in  system  ( 1.2 1  if  and  only  if  the  linear  program, 

(3.9) 


maximize  dj  x , 


subject  to  x  €  Qj 


'j  ’ 


has  an  optimal  solution  x*  such  that  ajx *  <  bj. 


3.3.  Approaches  for  identifying  redundant  and  nonredundant  constraints.  We  now  outline 
two  approaches  for  identifying  redundancy  in  the  constraint  set:  a  projection  method  for  identifying  nonre¬ 
dundant  constraints  and  a  linear  programming  (LP)  approach  for  identifying  redundant  ones.  The  LP 
approach,  which  is  based  on  Theorem  3.15  is  described  in  m ■  In  Section  |4j  we  will  explain  in  more  detail 
how  these  ideas  are  implemented  in  the  class  of  GPS  algorithm  for  linearly  constrained  problems,  even  in 
the  presence  of  degeneracy. 

3.3.1.  A  projection  method.  The  main  idea  of  the  projection  method  we  propose  is  the  construction, 
if  possible,  of  a  point  x  such  that  ajx  =  bj  and  afx  <  bi  for  all  i  £  I(xk,s)\{j}-  If  such  a  point  x  exists, 
then  by  Proposition  |3.14[  the  jtli  constraint  is  nonredundant. 


Recall  that  we  defined  in  (3.5 1  a  scaled  copy  A  of  the  matrix  A  and  a  scaled  vector  b.  We  denote  by 
Pj(xk),  the  projection  of  Xk  £  R”  onto  the  hyperplane  Hj  =  {x  £  S 
Then  by  (3.3 1  and  by  1 1 ay  1 1  =  1, 


-T 

a 


x  =  bj}.  Assume  that  Xk  £  A. 


Pj(xk)  =  xk  +  a,j(bj  -  a}  xk). 


(3.10) 


The  following  proposition  is  the  main  one  for  the  projection  method. 

Proposition  3.16.  Let  Xk  £  Ll  and  let  a  working  index  set  I(xk,£)  be  given.  An  inequality  aj x  <  bj, 
j  £  I{xk,s),  is  nonredundant  in  the  description  ofLt(xk,e)  if 


a-  Pj{xk)  <  bi  for  all  i  £  I(xk,e)\{j}, 

where  Pj(xk)  is  a  projection  of  xk  onto  Hj. 

Proof.  The  proof  follows  from  Proposition  |3.14  □ 


(3.11) 


Proposition  3.16  allows  us  to  very  quickly  classify  the  jtli  constraint  as  nonredundant  if  (3.111  holds  for 
all  i  £  I(xk,  s)\{j},  where  Pj{xk)  in  (3.111  is  obtained  from  (3.10l.  The  only  drawback  is  that  it  identifies 


nonredundant  constraints  and  not  redundant  ones. 

3.3.2.  The  linear  programming  approach.  If  some  constraints  have  not  been  identified  by  the 


projection  method,  we  can  apply  another  approach  based  on  Theorem  3.15  to  identify  redundant  and  nonre¬ 
dundant  constraints.  It  follows  from  Theorem  13. 151  that  all  redundant  and  nonredundant  constraints  could 


be  conclusively  identified  by  solving  n  LP  problems  of  the  form  given  in  (3.9 1.  While  doing  so  is  clearly  more 


expensive  than  the  projection  method  given  in  Section  |3.3.1|  it  could  be  accomplished  during  the  initializa¬ 
tion  step  of  GPS  (i.e.,  before  the  GPS  iteration  sequence  begins),  at  a  cost  of  solving  n  LP  problems.  This 
is  possible  because  redundancy  of  linear  constraints  is  independent  of  the  location  of  the  current  iterate. 
However,  the  projection  method  could  be  advantageous  when  many  linear  constraints  are  present  (which  is 
often  the  case  with  redundant  constraints),  or  when  dealing  with  linear  constraints  formed  by  linearizing 
nonlinear  ones.  In  the  latter  case,  redundancy  would  depend  upon  location  in  the  domain,  since  the  linear 
constraints  would  change  based  on  location. 

The  book  [T3]  describes  different  methods  in  the  context  of  the  LP  approach.  They  include  some  very 
special  propositions  involving  slack  variables  that  simplify  and  reduce  the  computational  cost  of  the  numerical 
solution  of  the  LP  problem  (3.9 1 .  We  refer  the  reader  to  pjjj  for  a  more  detailed  discussion  of  these  issues. 


4.  Construction  of  the  set  of  generators.  The  purpose  of  this  section  is  to  provide  a  detailed 
algorithm  for  constructing  the  set  of  directions  Dk  introduced  in  Section  [2]  even  in  the  presence  of  degenerate 
constraints. 


Let  some  scalar  e  >  0  be  given,  and  let  aj  be  the  ith  row  of  the  matrix  AT  in  (3.5 1 .  At  the  current 
iterate  xk,  we  construct  the  working  index  set  I{xk,s)  such  that 


0  <  bi  —  af  xk  <  £ 
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i  £  I{xk,£ )■ 


The  last  inequality  means  that  every  constraint  that  is  active  at  Xk  or  at  some  point  near  Xk  appears  in 
I{xk,e).  In  g],  the  authors  suggest  not  setting  e  so  small  that  is  made  small  by  approaching  the  boundary 
too  closely  before  including  conforming  directions  that  allow  the  iterates  to  move  along  the  boundary  of  f 1. 
A  good  discussion  of  how  to  choose  e  can  be  found  in  g3| . 

Without  loss  of  generality,  we  assume  that  I{xk ,  s)  =  {1, . . . ,  m},  for  m  >  2.  This  avoids  more  cumber¬ 
some  notation,  like  I(xk,s)  =  {h(xk,  e),  ■  •  • ,  im{xk,  e)}-  Furthermore,  we  denote  by  Bk ,  the  matrix  whose 
columns  are  the  columns  of  A  corresponding  to  the  indices  I(xk ,  e)  =  {1, . . . ,  m};  i.e., 

Bk  =  [otl ,  •  *  •  ,  dm]  ■  (4. 1) 


4.1.  Classification  of  degeneracy  at  the  current  iterate.  Let  the  matrix  Bk  be  defined  by  (4.1 1. 
At  the  current  iterate  Xk,  the  matrix  Bk  satisfies  one  of  the  following  conditions: 


•  nondegenerate  case:  Bk  has  full  rank; 

•  degenerate  redundant  case:  Bk  does  not  have  full  rank  and  the  nonredundant  constraints  are  linearly 
independent; 

•  degenerate  nonredundant  case:  Bk  does  not  have  full  rank  and  the  nonredundant  constraints  are 
linearly  dependent. 


The  last  condition  is  illustrated  by  following  example  provided  to  us  by  Charles  Audet. 


Example  2.  Suppose  that  the  feasible  region  f 1  (see  ( 1 .2 1 ) ,  shown  in  Figure  4.1  is  defined  by  the  following 
system  of  inequalities: 


X\  —  2X2  —  2x3 

< 

0 

—  2X!  +  X2  —  2X3 

< 

0 

— 2x\  -  2x2  +  X3 

< 

0 

Xi 

> 

0 

X2 

> 

0 

X3 

> 

0 

(4.2) 


If  Xk  £  R3  is  near  the  origin,  all  six  constraints  are  active,  linearly  dependent,  and  nonredundant.  The 
matrix  Bk  is  given  as 

/  1  -2  -2  -1  0  0  \ 

Bk  =  -2  1  -2  0  -1  0  . 

\  -2  -2  1  0  0  -1  / 


Fig.  4.1.  Example  2.  An  illustration  of  the  degenerate  nonredundant  case. 
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4.2.  Set  of  generators.  Following  [16] ,  we  define  the  cone  K(xk,e )  as  the  cone  generated  by  the 
normals  to  the  e-active  constraints,  and  K°(xk,e)  as  its  polar: 


K°(xk,e)  =  afw  <0  Vi  G  I(xk,e)}. 


(4.3) 


This  cone  can  also  be  expressed  as  a  finitely  generated  cone 
definition. 


.  To  see  this,  first  consider  the  following 
Definition  4.1  (Set  of  generators).  A  set  V  =  {it,  . . .  ,  i;r}  is  called  a  set  of  generators  of  the  cone  I\ 


defined  by  (4.3)  if  the  following  conditions  hold: 


1.  Every  vector  v  G  K  can  be  expressed  as  a  nonnegative  linear  combination  of  vectors  in  V. 

2.  No  proper  subset  ofV  satisfies  [7] 

Thus,  given  Definition  |4.1[  we  can  express  K°{xk,e)  as 


K°( xk,e)  =  {w  £ 


w 


-  X) 
3=  1 


Xj  0,  Vj  G 


j  =  l,...,r}, 


(4.4) 


where  the  V  =  (iq,  . . . ,  tv}  is  the  set  of  generators  for  K°(xk,  e). 

The  key  idea,  which  was  first  suggested  by  May  in  [TS]  and  applied  to  GPS  in  m,  is  to  include  in  Dk  the 
generators  of  the  cone  K°(xk,  e).  Hence,  the  problem  of  construction  of  the  set  Dk  reduces  to  the  problem  of 
constructing  generators  {tv, . . . ,  vr}  of  the  cone  K°(xk,  e)  and  then  completing  them  to  a  positive  spanning 
set  for  R". 

The  following  proposition  means  that  it  is  sufficient  to  construct  the  set  of  generators  only  for  nonre- 
dundant  constraints. 

Proposition  4.2.  Let.  I(xk,e)  be  the  set  of  indices  of  e-active  constraints  at  xk  G  Rn.  Let  I]si{xk,e)  C 
J( xk,e)  be  the  subset  of  indices  of  the  nonredundant  constraints  that  define  f 1(xk,e).  Let  the  cone  K°(xk,e) 


be  defined  by  (4.3 1  and  let  the  cone  K^(xk,e)  be  given  by 

K°N(xk,e)  =  {i»£t”  :  ajw  <0  V*  G  IN{xk,e)}. 

If  {v i, . . .  ,vp}  is  a  set  of  generators  for  Kfr(xkle),  then  it  is  also  a  set  of  generators  for  K°(xk,e). 

Proof.  The  proof  of  this  proposition  follows  from  Corollary  |3.13|  □ 

Pattern  search  methods  require  that  iterates  lie  on  a  rational  lattice  m ■  To  ensure  this,  Lewis  and 


Torczon  m  placed  an  additional  requirement  that  the  matrix  of  constraints  AT  in  (|1.2[)  is  rational.  Under 
this  requirement,  Lewis  and  Torczon  m  showed,  in  the  following  theorem,  that  it  is  always  possible  to  find 
rational  generators  for  the  cones  K°(xk,e),  which,  with  the  rational  mesh  size  parameter  A*,  ensures  that 
GPS  iterates  lie  on  a  rational  lattice. 

Theorem  4.3.  If  K  is  a  cone  with  rational  generators  V,  then  there  exists  a  set  of  rational  generators 
for  K° . 

Moreover,  for  the  case  of  linearly  independent  active  constraints,  Lewis  and  Torczon  [16]  proposed 
constructing  the  set  of  generators  for  all  the  cones  K°{xk,e),  0  <  e  <  S,  as  follows: 

Theorem  4.4.  Suppose  that  for  some  S,  K(x,S )  has  a  linearly  independent  set  of  rational  generators 
V .  Let  N  be  a  rational  positive  basis  for  the  null  space  of  VT .  Then ,  for  any  e,  0  <  e  <  5,  a  set  of  rational 
generators  for  K°(x,e)  can  be  found  among  the  columns  of  N,  V{VTV)~1 ,  and  —  V{VTV)~l . 

The  matrix  N  can  be  constructed  by  taking  columns  of  the  matrices  ±(7  —  V (VTV)~1VT)  [16! . 


Recall  that  we  use  the  scaled  matrix  A  defined  in  (3.5 1  to  determine  £-active,  redundant,  and  nonredun¬ 


dant  constraints.  Then  we  use  the  result  stated  in  Theorem  |4.4|  together  with  rational  columns  of  A,  which 
correspond  to  the  nonredundant  and  s-active  constraints,  to  obtain  a  set  of  rational  generators. 
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A  set  of  generators,  which  may  be  irrational  in  exact  arithmetic,  can  also  be  found  by  using  the  QR 
factorization  of  the  matrix  V .  The  following  corollary  shows  how  to  use  the  QR  factorization  of  V  to 
construct  the  generators  for  all  the  cones  K°(xk,s),  0  <  e  <  5.  Recall  that  the  full  QR  factorization  of  V 
can  be  represented  as 


V  —  [  Qi  Q2 


R\  R2 

0  0 


(4.5) 


where  R 1  is  upper  triangular  and  rank(i?i)  =  rank(P),  and  the  columns  of  Q\  form  an  orthonormal  basis 
for  the  space  spanned  by  the  columns  of  V,  while  the  columns  of  Q2  constitute  an  orthonormal  basis  for  null 
space  of  VT . 

Corollary  4.5.  Suppose  that  for  some  S,  K(x,S)  has  a  linearly  independent  set  of  rational  generators 
V.  Then,  for  any  e,  0  <  e  <  6,  a  set  of  generators  for  K°(x,e)  can  be  found  among  the  columns  of  Q2, 
QiRi(R^[ and  —  QiRi(Rj 

Proof.  By  substituting  V  =  QR  and  using  the  properties  of  the  matrices  in  the  QR  factorization,  we 
obtain 


V{V  V)~l  =  QR((QRy  (QR))_i  =  QR(R  Q1  QR)~l  =  QR{R  R) 


\ — 1 


(4.6) 


By  applying  Theorem  4.4  and  by  taking  into  account  that  columns  of  Q2  span  the  null  space  of  VT , 
obtain  the  statement  of  the  corollary.  □ 


we 


From  the  theoretical  point  of  view,  a  set  of  generators  obtained  by  using  Corollary  |4.5|may  be  irrational 
since  an  implementation  of  the  QR  decomposition  involves  calculation  of  square  roots.  This  would  violate 
theoretical  assumptions  required  for  convergence  of  pattern  search.  However,  since  V  is  rational,  both  sides 


of  (4.6 1  must  also  be  rational.  Therefore,  by  Corollary  4.5  any  generators  with  irrational  elements  would 


be  found  in  the  matrix  Q 2 .  But  in  the  degenerate  case,  Q 2  will  often  be  empty,  since  it  represents  a  positive 
spanning  set  for  the  null  space  of  VT ,  and  most  examples  of  degeneracy  occur  when  the  number  of  e-active 
constraints  exceeds  the  number  of  variables.  Furthermore,  since  we  use  floating  point  arithmetic  in  practice, 
irrational  generators  would  be  represented  as  rational  approximations.  This  has  the  effect  of  generating 
a  slightly  different  cone.  Thus,  it  would  be  enough  to  ensure  convergence,  but  to  a  stationary  point  of  a 
slightly  different  problem.  However,  the  error  experienced  in  representing  an  irrational  number  as  rational 
is  probably  smaller  than  the  typical  roundoff  error  associated  with  LU  factorization. 


4.3.  The  nonredundant  degenerate  case.  Perhaps  the  most  difficult  case  to  handle  is  the  one 
in  which  the  e-active  constraints  at  Xk  are  nonredundant,  but  linearly  dependent.  This  can  happen,  in 
particular,  when  there  are  more  e-active  constraints  than  variables,  as  is  the  case  in  Example  |4.2|  The 
difficulty  of  this  case  lies  in  the  fact  that  the  number  of  directions  required  to  generate  the  tangent  cone  can 
become  large. 

Let  Sk  =  {ai,  02, . . . ,  ap}  denote  the  set  of  vectors  corresponding  to  the  e-active  nonredundant  constraints 
at  Xk.  Price  and  Coope  [21]  showed  that,  in  order  to  construct  Dk ,  it  is  sufficient  to  identify  the  tangent  cone 
generators  of  all  maximally  linearly  independent  subsets  of  Sk-  For  S/.  with  r  =  rank(Sfc),  we  can  estimate 
the  number  s  of  these  subsets,  by 


p! 

r\(p  —  r)! 


(4.7) 


Thus,  in  order  to  identify  the  entire  set  of  tangent  cone  generators,  we  would  have  to  consider  s  different  sets 
of  positive  spanning  directions,  where  s  could  become  quite  large.  While  there  are  some  vertex  enumeration 
techniques  [Q  (mentioned  in  jT5])  that  could  be  useful,  we  now  present  a  more  practical  approach  -  first  in 
general,  and  then  followed  by  some  specific  instances  that  can  be  implemented  in  practice. 
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4.3.1.  Partially  conforming  generator  sets.  In  our  approach,  we  choose  a  subset  of  r  linearly  inde¬ 
pendent  elements  of  Sk  and  store  them  as  columns  of  Bk .  Based  on  the  methods  described  in  Section  [3.3.11 
we  can  construct  a  set  of  generators  for  the  cone  defined  only  by  a  subset  of  the  constraints  represented  in 
Sk-  Furthermore,  we  require  Bk  to  change  at  each  iteration  so  that,  in  the  limit,  each  constraint  that  is 
active  at  the  limit  point  x  has  been  used  infinitely  often  in  constructing  directions.  Since  the  set  of  tangent 
cone  generators  is  finite,  the  ordering  scheme  for  ensuring  this  is  straightforward. 


This  approach  is  essentially  equivalent  to  using  all  the  tangent  cone  generators,  except  that  it  is  spread 
out  over  more  than  one  iteration.  The  advantage  is  that  it  keeps  the  size  of  the  poll  set  no  larger  than  it 
would  be  in  the  nondegenerate  case.  However,  the  drawback  is  that  we  no  longer  have  a  full  set  of  directions 
that  conform  to  the  geometry  of  f l,  which  is  an  important  hypothesis  in  the  statement  of  Theorem |2.4| 


The  proof  of  Theorem  2.4  given  in  [Tj,  relies  on  two  crucial  ideas;  namely,  Lemma  2.3  and  the  use 
of  conforming  directions.  Under  the  proposed  method  of  handling  degenerate  nonredundant  constraints, 


Lemma  |2.3|  still  applies,  but  Theorem  2 A  cannot  be  applied,  since  not  all  the  tangent  cone  generators  are 
used  at  each  iteration.  We  introduce  the  following  theorem,  which  establishes  the  same  result  as  Theorem|2.4| 
but  with  a  different  hypothesis  and  a  proof  that  is  essentially  identical  (see  U). 


Theorem  4.6.  Let  x  G  LI  be  the  limit  point  of  a  refining  subsequence  {xk}keK-  Under  Assumptions 
A1-A3,  if  f  is  strictly  differentiable  at  x  and  all  generators  of  the  tangent  cone  Xh(£)  are  used  infinitely  often 
in  K,  then  X7  f(x)Tco  >  0  for  all  lo  G  Tq(x),  and  so  — V/(x)  G  Nq(£).  Thus,  x  is  a  Karush- Kuhn- Tucker 
point. 


Proof.  Lemma  2.3  and  the  strict  differentiability  of  /  at  x  ensure  that  V/( x)1  d  >  0  for  all  d  G  DCiTq(x). 
Since  D  includes  all  the  tangent  cone  generators,  and  each  is  used  infinitely  often  in  K ,  it  follows  that  every 
to  G  Tq(x)  can  be  represented  as  a  nonnegative  linear  combination  of  D  ft  Tq (x) :  thus,  S7 f{x)T to  >  0  for  all 
to  G  Tn(x).  To  complete  the  proof,  we  multiply  both  sides  by  —1  and  conclude  that  —Vf(x)  G  Nq(x).  □ 


4.3.2.  Generating  directions.  While  the  new  hypothesis  of  Theorem  |4.6|  is  weaker  and  makes  the 
result  more  general  than  Theorem  |2.4|  it  is  more  difficult  to  enforce.  The  enumeration  scheme  mentioned 
above  will  ensure  that  the  tangent  cone  generators  get  used  infinitely  often,  but  it  cannot  ensure  that  they 
get  used  infinitely  often  in  the  refining  subsequence.  Thus  we  make  the  following  additional  assumption. 

A4:  Any  direction  used  infinitely  often  is  also  used  infinitely  often  in  any  refining  subsequence. 

This  assumption  is  actually  quite  mild,  since  a  direction  that  does  not  satisfy  A4  would  always  be 
successful  (infinitely  often)  after  a  finite  number  of  iterations. 

The  following  result  establishes  an  important  connection  between  the  constraints  and  tangent  cone 
generators. 

Lemma  4.7.  Let  x  be  the  limit  of  a  subsequence  of  GPS  iterates,  and  let  S  and  D  be  the  sets  of  active 
constraints  and  tangent  cone  generators,  respectively,  at  x.  If  every  constraint  in  S  is  used  to  form  tangent 
cone  generators  infinitely  often  in  the  subsequence,  then  every  tangent  cone  generator  in  D  is  also  used 
infinitely  often  in  the  same  subsequence. 

S 

Proof.  Let  Sj,  j  =  1  ,...,s  be  maximally  linearly  independent  subsets  of  S,  such  that  S  =  U  4 

3=1 

Furthermore,  let  D(Sj)  denote  the  set  of  tangent  cone  generators  produced  by  only  the  constraints  in  Sj. 

S 

Price  and  Coope  [21,  show  that  D  C  [J  D(Sj).  Thus,  if  Sj  is  used  infinitely  often,  then  D(Sj)  is  used 

i=1 

infinitely  often,  and  if  every  Sj,  j  =  l,...,s,  is  used  infinitely  often,  then  every  direction  in  D  is  used 
infinitely  often.  □ 


We  now  give  two  examples  of  approaches  that  generate  directions  satisfying  the  hypotheses  of  Theo¬ 


rem 


4.6  followed  by  convergence  theorems  for  each. 
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Random  Selection:  Randomly  select  (with  uniform  probability)  r  linearly  independent  e-active  con¬ 
straints  to  form  tangent  cone  generators. 


Sequential  Selection:  Order  the  s  subsets  of  r  linearly  independent  elements  of  Sk  as  Si,  i  =  1, . . . ,  s, 
and  use  subset  Sv  j  =  1  +  k  mod  s,  at  iteration  k. 

Theorem  4.8.  Let  x  be  the  limit  of  a  refining  subsequence  {xk}keK,  in  which  the  set  of  nonredundant 
binding  constraints  at  x  is  linearly  dependent.  If  search  directions  are  obtained  by  Random  Selection  whenever 
the  elements  of  Sk  are  linearly  dependent,  then  with  probability  1,  all  tangent  cone  generators  at  x  will  be 
used  infinitely  often  in  K . 


Proof.  For  any  nonredundant  active  constraint  at  x ,  let  Pk  denote  the  probability  that  the  constraint  is 
randomly  selected  at  iteration  k.  Then  for  sufficiently  large  k,  the  set  Sk  is  fixed  with  p  elements  (cor¬ 
responding  to  the  active  constraints  at  x),  and  Pk  =  A  Then  the  probability  that  the  constraint  is 

selected  infinitely  often  in  any  infinite  subsequence  M  of  iterates  (with  sufficiently  large  k)  is  equal  to 

p  —  r 


i  _  n  (i  -  pfc) = 1  -  n 


keM 


keM 


p 


=  1.  The  result  then  follows  from  Lemma 


4.7 


□ 


Theorem  4.9.  Let  x  be  the  limit  of  a  refining  subsequence  {xk}keK >  in  which  the  set  of  nonredundant 
binding  constraints  at  x  is  linearly  dependent.  Under  assumption  Af,  if  search  directions  are  obtained  by 
Sequential  Selection  whenever  the  elements  of  Sk  are  linearly  dependent,  then  all  tangent  cone  generators  at 
x  will  be  used  infinitely  often  in  K. 


Proof.  Since  subset  Sj,  j  =  1  +  k  mod  s  is  used  at  iteration  k,  Sj  is  used  infinitely  often  in  the  iteration 
sequence.  Furthermore  Sj  C  S  for  all  sufficiently  large  k,  where  S  is  the  set  of  active  constraints  at  x.  The 
result  follows  from  Assumption  A4  and  Lemma  [4. 7|  □ 

In  each  of  these  two  instances,  something  is  sacrificed  for  the  sake  of  implementation  -  either  a  weaker 
convergence  result  (Random  Selection)  or  the  additional  assumption  A4  (Sequential  Selection).  However, 
if  function  evaluations  are  expensive,  the  alternative  of  identifying  all  the  tangent  cone  generators  at  each 
iteration  will  become  intractable. 


Furthermore,  this  by  no  means  exhausts  the  possibilities  for  choosing  tangent  cone  generators  when 
nonredundant  constraints  are  linearly  dependent.  Considering  that  the  projection  method  measures  distance 
to  each  constraint  boundary,  one  promising  alternative  is  to  select  the  closest  n  —  1  constraints  (with  ties 
broken  arbitrarily),  plus  one  more  constraint  obtained  by  either  random  or  sequential  selection.  The  latter 
constraint  allows  the  theory  in  the  previous  two  theorems  to  hold,  while  offering  an  intelligent  heuristic  in 
selecting  those  constraints  that  are  closer  to  the  current  iterate.  Choosing  the  closest  constraints  is  equivalent 
to  reducing  e  at  each  iteration  so  that  fewer  constraints  are  flagged  as  e-active. 


4.4.  An  algorithm  for  constructing  the  set  of  generators.  In  this  section,  we  present  an  algorithm 
for  constructing  a  set  of  generators  for  the  cone  K°(xk,£ )  at  the  current  iterate  x k  for  a  given  parameter  e. 


4.4.1.  Comments  on  the  algorithm.  The  algorithm  consists  of  two  main  parts.  In  the  first  part, 
we  determine  the  set  of  indices  of  the  nonredundant  e-active  constraints  I/v(xfc,£)  C  I(xk,c)  and  form 
the  matrix  Bn  whose  columns  are  the  columns  of  A  corresponding  to  the  indices  in  7/v(xfc,e).  We  use 
information  about  the  set  //v(xfc,e)  from  the  previous  iterations  of  the  GPS  algorithm.  Namely,  we  put 
into  the  set  //v(xfc,e)  all  indices  that  correspond  to  the  e-active  constraints  at  the  current  iterate  and  that 
were  detected  as  indices  of  the  nonredundant  constraints  at  the  previous  iterations  of  the  algorithm.  In  the 
second  part  of  the  algorithm,  we  construct  the  set  of  generators  Dk  required  by  GPS  and  by  Theorem |2.4| 


First,  we  try  to  identify  the  nonredundant  active  constraints.  If  the  matrix  Bk  defined  by  (4.1 1  has  full 
rank,  then  all  e-active  constraints  are  nonredundant,  /Ar(xfc,e)  =  /(xfc,e),  and  Bn  =  Bk-  If  the  matrix  Bk 
does  not  have  full  rank  and  we  have  indices  that  have  not  been  classified  at  the  previous  iterations  of  the 
algorithm,  we  propose  using  two  steps  in  succession. 
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The  first  strategy  is  intended  to  determine  nonredundant  constraints  cheaply  by  applying  the  projection 


method  described  in  section  3.3.1  By  Proposition  3.16  if  the  projection  Pj(xk)  of  the  current  iterate  Xk 


onto  the  hyperplane  Hj  =  {x  £  :  ajx  =bj}  is  feasible,  and  only  the  jth  constraint  holds  with  equality 

at  Pj{xk),  then  the  _jth  constraint  is  nonredundant,  and  we  can  put  index  j  into  the  set  Iisr(xk-,s).  If  some 
constraints  have  not  been  identified  by  the  projection  method,  we  can  either  apply  the  projection  method 
with  some  other  point  x  Xk  or  apply  the  second  strategy. 


The  second  strategy  is  intended  to  classify  redundant  and  nonredundant  constraints  among  those  that 
have  not  already  been  determined  as  nonredundant  by  the  projection  method.  To  identify  each  constraint, 
the  approach  outlined  in  |10]  and  in  Section  3.15  is  applied.  If  the  number  of  constraints  to  be  identified  is 
too  large,  we  can  skip  an  application  of  this  strategy  and  construct  a  set  of  generators  using  the  set  £) 

obtained  from  the  first  strategy.  Then,  while  performing  the  poll  step,  if  we  find  some  point  x  =  Xk  +  A d, 


Jx  >  bj 


where  d  is  some  column  of  Dk,  such  that  a 
that  f l{xk,e)  C  ttj(xk,e)-  Hence,  by  Corollary 
the  set  I]si{xk,e). 


3.13 


and  ajx  <  bj  for  all  i  £  I(xk,e)\{j},  we  can  conclude 
the  jth  constraint  is  nonredundant,  and  we  add  j  to 


Once  we  have  specified  all  redundant  and  nonredundant  constraints,  we  compose  the  matrix  Bn  of 
those  columns  of  A  that  correspond  to  nonredundant  constraints.  The  rank  of  Bn  can  be  determined  by 
QR  factorization.  If  B n  has  full  rank,  then  we  construct  the  set  of  generators  using  QR  or  LU  factorization. 
If  Bn  does  not  have  full  rank,  we  construct  the  set  of  generators  from  a  set  of  linearly  independent  columns 
of  Bn,  and  as  the  iteration  sequence  progresses,  we  invoke  one  of  the  methods  described  in  Section  |4~3]  to 
ensure  that  all  maximally  linearly  independent  subsets  get  used  infinitely  often. 


4.4.2.  Algorithm.  We  denote  the  set  of  indices  of  the  nonredundant  e-active  constraints  by  In{xic ,  e). 
Thus,  for  j  £  I(xk,e), 


1.  if  j  £  In{xi~,£),  then  the  inequality  ajx  <  bj  is  nonredundant;  and 

2.  if  j  £  I(xk,  e)\/jv(*fe,  e),  then  the  inequality  ajx  <  bj  is  redundant. 

We  use  In  C  /  to  denote  the  set  of  indices  that  are  detected  as  nonredundant  at  some  iteration  of  the 
algorithm.  Thus  In  =  0  in  the  beginning  of  the  algorithm. 


We  denote  the  rational  matrix  in  (1.2 1  by  AT  and  the  scaled  matrix  defined  in  (3.5)  by  AT .  The  matrix 
Bk  is  defined  by  (4.1)  and  is  composed  of  columns  aj  of  A ,  where  j  £  I(xk,e),  while  the  matrix  Bn  is 
composed  of  those  columns  of  A  whose  indices  are  in  the  set  lN{xk,e).  Thus,  the  columns  of  Bn  are  those 
vectors  normal  to  the  nonredundant  constraints. 


Algorithm  for  constructing  the  set  of  generators  D)~. 
Let  the  current  iterate  Xk  £  Kn  and  a  parameter  s  >  0  be  given. 

%  Part  I:  Constructing  the  set  lN(xk,e) 


%  Construct  the  working  index  set  I(xk,e) 

for  i  =  1  to  |  J| 

if  0  <  bj  —  dj Xk  <  e 
I(xk,£ )  <-  I(xk,e )  U  {i} 
ddk  *  [-Hfc;  cq] 

endif 

endfor 


if  rank(Bfc)  =  \I(xk,e)\ 


%  the  case  where  all  constraints  are  nonredundant 


else 


%  using  information  from  the  previous  iterations  of  the  algorithm 
for  each  j  e  {I(xk,  e)  f|  In} 

In (xk ,  e)  <-  IN (xk ,  e)  U  { j} 

Bn  [Bn,  %] 

endfor 


%  Identification  of  the  nonredundant  and  redundant  constraints 
for  each  j  in  {l(xk,  e)\IN(xk,  e)} 

%  first  strategy 

Pj(xk )  =  Xfc  +  aj(bj  —  ajxk )  %  see  Lemma  3.4 

if  dfPj(xk)  <  hi  for  all  i  £  I\{j} 
lN{xk,e)  <-  IN{xk,e)  U  {j} 

Bn  <—  [Bn,  af] 

In  ^  In^I  {j} 

else 


%  the  second  strategy 
solve  LP  problem  (3.151  for  x* 


if  aj x*  <  bj  %  the  jth  constraint  is  redundant 

remove  a3x  <  bj  from  f 1 

I-I\{j} 

I{xk,e)  <-  I(xk,e)  \  {j} 

else  %  the  j  th  constraint  is  nonredundant 

IN{xk,e)  <-  IN{xk,z)  U  {j} 

Bn  <—  [Bn,  aj] 

In  <—  In'J  {j} 

endif 

endif 

endfor 


endif 


%  Part  II:  Constructing  the  set  of  generators  Dk 

r  =  rank  (Bn) 

if  r  \IN(xk,  e)|  %  degenerate  case 

Bn  <—  B:  where  B  is  composed  of  r  linearly  independent  columns  of  Bn 

endif 

V=Bn 

Di  4-  P(PTP)-1 

D2  I  -V(VTV)-1VT 

D  =  [Di,  D2,  —Di,  —D2] 

As  discussed  in  Section  [4. 2|  the  construction  of  the  directions  in  D,  in  practice,  can  be  done  making  use 
of  either  LU  decomposition,  as  suggested  by  Lewis  and  Torczon  di,  or  by  the  more  efficient  QR  factorization 
approach  presented  in  Section  |4~2|  In  the  latter  case,  D\  and  D2  are  computed  according  to  Corollary  |4.5| 

We  should  point  out  that,  in  practice,  the  choice  of  e  can  have  a  significant  affect  on  numerical  perfor¬ 
mance.  If  the  value  is  set  too  low,  then  the  mesh  size  may  become  very  small  before  appropriate  conforming 
directions  are  generated.  If  this  happens,  the  algorithm  may  then  progress  along  a  new  conforming  direction, 
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but  with  the  significantly  reduced  mesh  size,  resulting  in  a  lot  more  function  evaluations.  On  the  other  hand, 
too  large  a  value  may  mark  too  many  constraints  as  active.  This  could  result  in  otherwise  good  directions 
being  replaced  by  worse  ones,  and  even  a  false  detection  of  degeneracy,  resulting  in  additional  unnecessary 
function  evaluations. 


4.4.3.  Numerical  Tests.  To  test  the  algorithm,  we  formed  five  test  problems  with  varying  numbers 
of  variables  and  redundant  linear  constraints  to  test  the  ability  of  our  approach  to  accurately  construct  the 
set  Ipf(xk,s)  of  nonredundant  constraints.  In  doing  so,  we  chose  a  trial  point  Xk  close  to  several  of  the 
constraints  and  tested  the  ability  of  our  algorithm  to  identify  the  nonredundant  ones.  The  test  problems  are 
described  as  follows: 


Test  Problem  1:  Same  as  the  problem  given  in  (4.2 1,  with  a  test  value  of  Xk  =  (0.1, 0.1, 0.1)y  . 

Test  problem  2:  The  following  problem  with  a  test  value  of  Xk  =  (0.01,  —0.01,  —0.01,  —0.00001,  0.01)T: 


—  Xi  +  X2 

< 

0 

X\  +  X2 

< 

1 

X2  +  %3  +  £4 

< 

0 

—  X2  +  x5 

< 

5 

-Xi  +  x2 

< 

0 

£3 

< 

0 

O.8.T1  +  x2  +  x3 

< 

0 

Test  Problem  3:  The  following  problem  with  a  test  value  of  Xk  =  (0.01, 0.01, 0.01, 0.01,  0.01)T: 


X\  —  2x2  —  2a,’3  +  X4  <0 
— 2a;  1  +  X2  —  2x3  <  0 

— 2a;  1  —  2x2  +  £3  <0 

— x\  <  0 

-X'2  <  0 

X3  -  x5  <0 
— X4  —  0.1x5  <  0. 


Test  Problem  4:  Same  as  Test  Problem  3,  but  with  Xk  =  (0.000001,  0.000001, 0.000001, 0.000001,  0.1)T. 
Test  Problem  5:  Same  as  Test  Problem  3,  but  with  Xk  =  (0.001, 0.001, 0.001, 0.001, 0.001)T. 


We  report  results  in  Table  4. 1  where  each  row  corresponds  to  one  of  the  five  test  problems  (in  the 
order  presented),  and  where  the  number  of  variables  is  given  in  the  first  column.  Columns  2  and  3  show 
the  number  of  nonredundant  and  redundant  constraints,  respectively,  with  their  sum  representing  the  total 
number  of  constraints  for  each  problem.  The  last  two  columns  indicate  how  many  of  the  constraints  were 
identified  as  nonredundant,  first  by  the  projection  method,  and  then  by  the  LP  approach  if  projection  failed 
to  identify  all  the  nonredundant  ones.  As  is  shown  in  the  table,  the  projection  method  identifies  most  of  the 
nonredundant  constraints,  and  with  the  LP  method  as  a  backup,  all  the  constraints  are  correctly  identified. 


With  this  approach  in  place,  the  number  of  GPS  iterations  required  for  a  problem  with  no  redundant 
constraints  will  be  no  different  than  for  a  modified  version  of  the  same  problem,  in  which  any  number  of  ad¬ 
ditional  redundant  constraints  are  added,  since  the  algorithm  detects  and  removes  the  redundant  constraints 
at  each  iteration. 


Finally,  we  coded  up  the  random  selection  and  sequential  selection  approaches  for  handling  linearly  de¬ 
pendent,  nonredundant  e-active  constraints  (see  Section  4.3 1,  added  this  code  to  the  NOMADm  software  p]), 
and  tested  the  approaches  on  the  following  test  problem: 


Test  problem  6: 


min  —  x\  —  x\ 

X 


-*3 
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Table  4.1 

Constructing  the  set  If^(xk,e)  at  the  current  iterate  Xk 


Constraints 

Detected  as 

nonredundant 

Variables 

Nonredundant 

Redundant 

by  Projection 

by  LP  approach 

3 

6 

0 

6 

5 

6 

1 

5 

1 

5 

7 

0 

6 

1 

5 

7 

0 

5 

2 

5 

7 

0 

6 

1 

subject  to 

4a; 

i  +  4a;2  - 

-  3a;3 

> 

16a; 

i  +  8a;2  - 

-  9a;3 

> 

24a;  i 

+  8a;2  - 

lla;3 

> 

8aq  - 

-  24a;2  + 

23a;3 

> 

8aq 

+  8a;2  - 

13a;3 

< 

24a;  i 

+  8a;2  - 

25a;3 

< 

24a;  i 

-  8a;2  - 

21a;3 

< 

X\  +  2a;2 

-  %3 

> 

X\  +  x2 

+  %3 

< 

Test  Problem  6  has  a  degenerate  global  maximizer  at  the  origin,  which  is  not  the  local  solution.  We  start 
the  algorithm  there,  and  in  order  to  avoid  stalling  there,  a  set  of  n  =  3  constraints  must  be  chosen  that 
will  generate  a  feasible  descent  direction.  Since  the  algorithm  does  not  evaluate  the  objective  at  infeasible 
points,  and  the  cones  of  feasible  directions  and  of  descent  directions  coincide  at  this  point,  moving  off  the 
degenerate  point  will  always  occur  at  the  second  function  evaluation.  Thus  our  measure  of  performance 
becomes  the  number  of  iterations,  rather  than  function  evaluations,  required  to  move  off  the  degenerate 
point.  The  number  of  iterations  gives  a  measure  of  how  many  unsuccessful  attempts  the  algorithm  made  at 
including  a  feasible  descent  direction  in  its  set  of  poll  directions. 


For  sequential  selection,  we  paired  the  constraints  in  consecutive  order;  i.e.,  constraints  1  and  2,  1  and  3, 
1  and  4,  etc.  The  algorithm  required  9  iterations  to  move  off  of  the  degenerate  point.  For  random  selection, 
we  performed  10  replications  and  achieved  the  following  number  of  iterations  to  move  off  the  degenerate 
point:  {4, 3, 4, 2,  2,  2, 5,  7, 2,  6}. 


We  tested  both  approaches  with  default  settings  of  A0  =  1,  a  mesh  refinement  strategy  of  A^+i  =  |Afe, 
and  empty  search  step.  The  variable  e  was  set  to  10-4  (although  this  had  no  bearing  on  performance, 
since  we  started  at  the  degenerate  point),  and  the  QR  factorization  was  used  in  constructing  tangent  cone 
generators.  Both  direction  selection  approaches  successfully  moved  off  the  degenerate  point.  A  comparison 
between  the  two  approaches  is  not  particularly  relevant,  since  the  result  for  sequential  selection  is  highly 
dependent  on  the  order  in  which  constraints  are  expressed  or  chosen.  In  the  worst-case,  since  each  iteration 
required  2n  =  6  directions,  consideration  of  between  12  and  54  directions  were  required  to  move  off  the 
degenerate  point.  However,  had  we  attempted  to  compute  all  the  directions  during  the  first  iteration,  we 
might  have  had  to  consider,  by  (4.7 1,  a  worst-case  84  subsets  of  constraints,  or  504  directions. 


As  an  aside,  we  also  let  the  algorithm  run  to  completion  (termination  tolerance  of  A/.  <  10-8),  and 
all  11  runs  successfully  converged  to  one  of  two  local  minimizers  (at  approximately  (3.67,  0.221, 4. 11)T  and 
(0.471,  3.76,  3.76)t).  Sequential  selection  required  a  total  of  67  iterations  and  78  function  evaluations,  while 
random  selection  required  55-91  iterations  and  84-144  function  evaluations. 
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We  should  point  out  that,  for  this  problem,  there  is  no  real  cost  savings  in  terms  of  function  evaluations 
because  the  infeasible  points  were  not  evaluated.  Had  we  chosen  a  problem  in  which  the  initial  point  was 
degenerate,  but  neither  a  minimizer  or  maximizer,  then  the  number  of  function  evaluations  would  become  a 
factor.  In  this  case,  the  selection  of  certain  combinations  of  constraints  would  generate  feasible  non-descent 
directions,  which  would  result  in  additional  function  evaluations  at  points  with  worse  function  values.  But 
even  without  the  cost  savings,  we  still  avoid  the  potentially  intractable  task  of  specifically  identifying  all  the 
tangent  cone  generators  at  each  iteration  via  some  vertex  enumeration  scheme. 

5.  Nonlinearly  constrained  minimization.  The  goal  of  this  section  is  to  illustrate  how  the  projec¬ 
tion  approach  proposed  in  this  paper  can  also  be  effective  for  handling  degeneracy  in  nonlinearly  constrained 
optimization  problems.  In  doing  so,  we  should  point  out  that  our  approach  is  different  than  that  of  [20]  and 
[26]  (and  others  cited  in  these  papers).  Both  approaches  use  local  information  about  the  (twice  continuously 
differentiable)  objective  and  constraint  functions  to  identify  active  constraints.  Moreover,  the  focus  in  [26] 
is  on  distinguishing  between  strongly  and  weakly  active  constraints  -  the  latter  having  Lagrange  multiplier 
values  of  zero.  In  our  case,  we  do  not  have  multiplier  values  available,  and  even  if  we  did,  most  direct  search 
methods  we  might  consider  using,  such  as  |12j  and  m,  can  handle  weakly  active  constraints  transparently 
if  constraint  gradients  are  linearly  independent. 

We  consider  the  nonlinearly  constrained  optimization  problem 

min  /( x)  subject  to  x  £  =  {x  £  R"  :  Ci{x)  <0,  i  =  1,. . .  ,q}.  (5.1) 

All  constraint  functions  Cj,  i  =  1,  2, . . . ,  q,  are  assumed  to  be  continuously  differentiable,  but  their  gradients 
may  not  be  available.  The  algorithm  in  m  uses  constraint  gradients,  while  the  one  in  [T2J  uses  only 
approximations.  Our  intent  is  to  be  as  general  as  possible,  so  that  the  ideas  presented  here  might  be 
extendable  to  both  algorithms,  as  well  as  other  direct  search  methods. 

Similar  to  Section  [3]  the  region  defined  by  all  but  the  j-th  constraint  is  given  by 

Oj  =  {i  e  R"  :  Ci(x)  <  0,  i  £  I\{j}}, 

where  I  =  {1,2,...,  q}.  Additionally,  for  5  >  0,  we  define  U$(x)  =  {y  £  Rn  :  \\y  —  x||  <  <5} ,  and  offer  a 
definition  of  local  redundancy  (nonredundancy) ,  in  the  sense  that  the  constraints  are  locally  nonredundant 
if  they  define  the  shape  of  the  feasible  region  in  some  neighborhood  of  a  point  x  £  Rra.  This  is  illustrated  in 
Figure  |5.1| 


Fig.  5.1.  An  illustration  of  a  locally  redundant  constraint.  Constraint  2  is  locally  redundant  at  x 

Definition  5.1  (Locally  redundant  constraint).  The  jth  constraint  Cj(x)  <  0  is  locally  redundant  at  x 
in  the  description  of  if,  for  some  S  >  0,  fl  U$(x )  =  flj  D  Us(x),  and  is  locally  nonredundant  otherwise. 

Our  main  interest  is  in  the  problem  of  constructing  search  directions  that  conform  to  the  boundary  of 
f l.  First,  we  define  constraint  j  as  e-active  if  —e  <  Cj(x )  <  0.  For  the  iterate  Xk  at  iteration  k,  we  denote 
by  I(xk,e)  the  set  of  indices  of  the  e-active  constraints  at  Xk]  namely, 

I{xk,e)  =  {j  =  1,2, .  ..,q  :  -e  <  c0{ xk)  <  0}, 
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and  extend  the  following  from  similar  definitions  given  in  Section  [3] 

fl{xk,e)  =  {ieR":  Ci(x)  <  0,  i  £  I(xk,e)}, 

Qj(xk,e)  =  {ieR":  Ci(x )  <  0,  i  £  I(xk,e)\{j}},  j  £  I(xk,e). 


If  xk  is  close  to  the  boundary  of  ft,  then  the  set  of  directions  should  contain  generators  for  the  tangent 
cone  Tn(xk)  for  boundary  points  near  xk. 

We  assume  that  estimates  of  the  gradients  Vci(xk),  i  =  1,  2, . . . ,  q,  are  available.  Thus,  an  estimate 
C(-k\l(xk,e))  of  the  tangent  cone  Tn{xk)  is  given  by 


C(fc)(J (®fc,e))  =  {u  £  Rn  :  vTaf]  <0  V*  £  I(xk,e)}.  (5.2) 


By  Definition  |4.1|  each  of  these  cones  can  be  expressed  as  the  set  of  nonnegative  linear  combinations  of  a 
finite  number  of  generators,  {^}^=1  C  Rn. 

One  of  the  main  assumptions  in  [T2j  is  that  at  each  point  x  on  the  boundary  of  ft,  the  gradients  of  the 
constraints  active  at  x  are  linearly  independent.  By  extending  our  ideas  from  previous  subsections  to  the 
nonlinear  case,  this  assumption  can  be  relaxed. 

The  next  proposition  is  simply  an  application  of  Proposition  |4.2|  to  the  cone  defined  by  the  linearized 
constraints,  except  that  the  index  sets  apply  to  nonlinear  constraints.  As  a  consequence,  it  is  sufficient  to 
construct  the  set  of  generators  for  only  the  locally  nonredundant  constraints. 


Proposition  5.2.  Let  lN(xk,e)  C  I(xk,e)  be  the  subset  of  indices  of  the  locally  nonredundant  con¬ 
straints  that  define  fl(xk,s).  Let  the  cone  C^k\l{xk,e))  be  defined  by  (5.2 1  and  let  the  cone  C be  given 


by  C =  {u  £  Rn  :  vTa\h'>  <0  Vi  £  /jv(xfc,£)}-  V  {^l,  •  •  • ,  vp}  is  a  set  of  generators  for  C 
also  a  set  of  generators  for  C^k\l(xk,e)). 

Proof.  This  follows  directly  from  Corollary  |3.13|  □ 

To  extend  the  projection  approach  described  in  Section  3.3.1|for  detecting  locally  nonredundant  nonlinear 
constraints,  we  simply  project  onto  a  linearization  of  the  constraint  boundary,  based  on  approximations  to  the 
constraint  gradients  at  the  current  iterate;  i.e.,  we  project  onto  the  hyperplane  Hj  =  {z>  £  R"  :  vT  ak>  =  0}. 
Scaling  the  constraints  similar  to  (3.5 1  and  applying  Lemma  3.4  yields  a  projection  equation  similar  to  ( 3. 10|) ; 
namely, 


N 

(k) 

N 


then  it  is 


Pj(xk)  =  xk  +  af)Cj(xk),  af'1 


j  =  1,2,...,  9- 


(5.3) 


(k) 

If  the  generators  of  CyN  at  iteration  k  are  linearly  independent,  then  they  would  all  be  included  in  the 
set  of  search  directions  for  that  iteration.  Otherwise,  the  set  of  search  directions  would  include  a  maximal 
linearly  independent  subset  of  the  generators,  selected  in  exactly  the  same  manner  as  discussed  in  Section [473] 


We  omit  a  formal  discussion  of  convergence,  since  any  results  would  be  dependent  on  the  algorithm  being 
used  and  on  the  details  of  its  implementation.  However,  it  appears  safe  to  assume  that  any  convergence 
results  will  require  a  certain  degree  of  accuracy  by  the  vectors  a)  ;  as  approximations  to  the  constraint 
gradients  Vcj(xk). 


We  view  these  ideas  as  a  natural  extension  of  those  of  Section  3.3.1  and  one  that  can  achieve  a  significant 
cost  savings  over  the  LP  approach.  Recall  from  Section  |3. 3. 2|that  the  expense  of  the  LP  approach  for  linear 
constrained  problems  can  be  circumvented  by  performing  it  before  the  algorithm  commences,  since  the 
redundancy  of  each  constraint  is  independent  of  the  location  of  the  current  iterate.  However,  this  is  not  true 
for  nonlinear  constraints,  in  which  case,  the  LP  approach  would  have  to  be  performed  at  every  iteration, 
which  is  considerably  more  expensive  than  projection. 
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6.  Concluding  remarks.  This  paper  fills  an  important  gap  in  the  pattern  search  literature,  comple¬ 
menting  the  previous  work  of  Lewis  and  Torczon  m  by  rigorously  treating  the  case  of  degenerate  linear 
constraints.  We  have  introduced  an  inexpensive  projection  method  for  identifying  nonredundant  constraints, 
which,  when  used  in  conjunction  with  a  linear  programming  approach  as  a  backup,  can  cheaply  assess  the 
redundancy  of  each  constraint,  and  thus  aid  pattern  search  in  computing  directions  that  conform  to  the 
boundary  of  the  feasible  region.  For  the  case  in  which  nonredundant  e-active  constraints  are  linearly  de¬ 
pendent,  we  avoid  complete  enumeration  of  tangent  cone  generators  by  including  only  a  subset  of  them, 
and  then  changing  them  at  each  iteration,  such  that  all  are  used  infinitely  often  over  the  entire  iteration 
sequence.  We  prove  first-order  convergence  in  this  case,  under  an  additional  mild  assumption.  Finally,  we 
have  shown  how  our  ideas  can  be  extended  to  nonlinearly  constrained  optimization  problems  under  similar 
degenerate  conditions. 
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